TO APPEAR IN IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 






PAR-Aware Large-Scale Multi-User 
MIMO-OFDM Downlink 

Christoph Studer, Member, IEEE, and Erik G. Larsson, Senior Member, IEEE 



Abstract — We investigate an orthogonal frequency-division 
multiplexing (OFDM)-based downlink transmission scheme 
for large-scale multi-user (MU) multiple-input multiple-output 
(MIMO) wireless systems. The use of OFDM causes a high peak- 
to-average (power) ratio (PAR), which necessitates expensive and 
power-inefficient radio-frequency (RF) components at the base 
station. In this paper, we present a novel downlink transmission 
scheme, which exploits the massive degrees-of-freedom available 
in large-scale MU-MIMO-OFDM systems to achieve low PAR. 
Specifically, we propose to jointly perform MU precoding, OFDM 
modulation, and PAR reduction by solving a convex optimization 
problem. We develop a corresponding fast iterative truncation 
algorithm (FITRA) and show numerical results to demonstrate 
tremendous PAR-reduction capabilities. The significantly reduced 
linearity requirements eventually enable the use of low-cost RF 
components for the large-scale MU-MIMO-OFDM downlink. 

Index Terms — Multi-user wireless communication, multiple- 
input multiple-output (MIMO), orthogonal frequency-division 
multiplexing (OFDM), peak-to-average (power) ratio (PAR) re- 
duction, precoding, convex optimization. 

I. Introduction 

LARGE-SCALE multiple-input multiple-output (MIMO) 
wireless communication is a promising means to meet 
the growing demands for higher throughput and improved 
quality-of-service of next-generation multi-user (MU) wireless 
communication systems [2]. The vision is that a large number 
of antennas at the base-station (BS) would serve a large 
number of users concurrently and in the same frequency band, 
but with the number of BS antennas being much larger than 
the number of users [3], say a hundred antennas serving ten 
users. Large-scale MIMO systems also have the potential to 
reduce the operational power consumption at the transmitter 
and enable the use of low-complexity schemes for suppressing 
MU interference (MUI). All these properties render large-scale 
MIMO a promising technology for next-generation wireless 
communication systems. 

While the theoretical aspects of large-scale MU-MIMO systems 
have gained significant attention in the research community, 
e.g., [2]-[6], much less is known about practical transmission 
schemes. As pointed out in [7], practical implementations 
of large-scale MIMO systems will require the use of low-cost 
and low-power radio-frequency (RF) components. To this 
end, reference [7] proposed a novel MU precoding scheme for 
frequency-flat channels, which relies on per-antenna constant-envelope 
(CE) transmission to enable efficient implementation 
using non-linear RF components. Moreover, the CE precoder 
of [7] forces the peak-to-average (power) ratio (PAR) to unity, 
which is not necessarily optimal as in practice there is always 
a trade-off between PAR, error-rate performance, and 
power-amplifier efficiency. 

Practical wireless channels typically exhibit frequency- 
selective fading and a low-PAR precoding solution suitable 
for such channels would be desirable. Preferably, the solution 
should be such that the complexity required in each (mobile) 
terminal is small (due to stringent area and power constraints), 
whereas heavier processing could be afforded at the BS. 
Orthogonal frequency-division multiplexing (OFDM) [8] is an 
attractive and well-established way of dealing with frequency-selective 
channels. In addition to simplifying the equalization 
at the receiver, OFDM also facilitates per-tone power and bit 
allocation, scheduling in the frequency domain, and spectrum 
shaping. However, OFDM is known to suffer from a high 
PAR [9], which necessitates the use of linear RF components 
(e.g., power amplifiers) to avoid out-of-band radiation and 
signal distortions. Unfortunately, linear RF components are, 
in general, more costly and less power efficient than their 
non-linear counterparts, which would eventually result in 
exorbitant costs for large-scale BS implementations having 
hundreds of antennas. Therefore, it is of paramount importance 
to reduce the PAR of OFDM-based large-scale MU-MIMO 
systems to facilitate corresponding low-cost and low-power 
BS implementations. 

To combat the challenging linearity requirements of OFDM, 
a plethora of PAR-reduction schemes have been proposed 
for point-to-point single-antenna and MIMO wireless systems, 
e.g., [10]-[13]. For MU-MIMO systems, however, a straightforward 
adaptation of these schemes is non-trivial, mainly 
because MU systems require the removal of MUI using a precoder [17]. 
PAR-reduction schemes suitable for the MU-MISO 
and MU-MIMO downlink were described in [18] and [19], 
respectively, and rely on Tomlinson-Harashima precoding. 
Both schemes, however, require specialized signal processing 
in the (mobile) terminals (e.g., modulo reduction), which 
prevents their use in conventional MIMO-OFDM systems, 
such as IEEE 802.11n [20]. 

A. Contributions 

In this paper, we develop a novel downlink transmission 
scheme for large-scale MU-MIMO-OFDM wireless systems, 
which only affects the signal processing at the BS while 
leaving the processing required at each terminal untouched. 
The key idea of the proposed scheme is to exploit the excess 
of degrees-of-freedom (DoF) offered by equipping the BS 
with a large number of antennas and to jointly perform MU 
precoding, OFDM modulation, and PAR reduction, referred to 
as PMP in the remainder of the paper. Our contributions can 
be summarized as follows: 

• We formulate PMP as a convex optimization problem, 
which jointly performs MU precoding, OFDM modula- 
tion, and PAR reduction at the BS. 

• We develop and analyze a novel optimization algorithm, 
referred to as fast iterative truncation algorithm (FITRA), 
which is able to find the solution to PMP efficiently 
for the (typically large) dimensions arising in large-scale 
MU-MIMO-OFDM systems. 

• We present numerical simulation results to demon- 
strate the capabilities of the proposed MU-MIMO-OFDM 
downlink transmission scheme. Specifically, we analyze 
the trade-offs between PAR, error-rate performance, and 
out-of-band radiation, and we present a comparison with 
conventional precoding schemes. 

B. Notation 

Lowercase boldface letters stand for column vectors and 
uppercase boldface letters designate matrices. For a matrix $\mathbf{A}$, 
we denote its transpose, conjugate transpose, and largest singular 
value by $\mathbf{A}^T$, $\mathbf{A}^H$, and $\sigma_{\max}(\mathbf{A})$, respectively; 
$\mathbf{A}^\dagger = \mathbf{A}^H(\mathbf{A}\mathbf{A}^H)^{-1}$ stands for the pseudo-inverse of $\mathbf{A}$ and 
the entry in the $k$th row and $\ell$th column is $[\mathbf{A}]_{k,\ell}$. The $M \times M$ 
identity matrix is denoted by $\mathbf{I}_M$, the $M \times N$ all-zeros matrix 
by $\mathbf{0}_{M \times N}$, and $\mathbf{F}_M$ refers to the $M \times M$ discrete Fourier 
transform (DFT) matrix. The $k$th entry of a vector $\mathbf{a}$ is designated 
by $[\mathbf{a}]_k$; the Euclidean (or $\ell_2$) norm is denoted by $\|\mathbf{a}\|_2$, 
$\|\mathbf{a}\|_\infty = \max_k |[\mathbf{a}]_k|$ stands for the $\ell_\infty$-norm, and the 
$\ell_{\widetilde{\infty}}$-norm [21] is defined as 
$\|\mathbf{a}\|_{\widetilde{\infty}} = \max\{\|\Re\{\mathbf{a}\}\|_\infty, \|\Im\{\mathbf{a}\}\|_\infty\}$ 
with $\Re\{\mathbf{a}\}$ and $\Im\{\mathbf{a}\}$ representing the real and imaginary 
part of $\mathbf{a}$, respectively. Sets are designated by upper-case 
calligraphic letters; the cardinality and complement of the 
set $\mathcal{T}$ is $|\mathcal{T}|$ and $\mathcal{T}^c$, respectively. For $x \in \mathbb{R}$ we define 
$[x]^+ = \max\{x, 0\}$. 

C. Outline of the Paper 

The remainder of the paper is organized as follows. Section II 
introduces the system model and summarizes important 
PAR-reduction concepts. The proposed downlink transmission 
scheme is detailed in Section III and the fast iterative 
truncation algorithm (FITRA) is developed in Section IV. 
Simulation results are presented in Section V and we conclude 
in Section VI. 

II. Preliminaries 

We start by introducing the system model that is considered 
in the remainder of the paper. We then provide a brief 
overview of (linear) MU precoding schemes and, finally, we 
summarize the fundamental PAR issues arising in OFDM-based 
communication systems. 

A. System Model 

We consider an OFDM-based MU-MIMO downlink scenario 
as depicted in Fig. 1. The BS is assumed to have a 
significantly larger number of transmit antennas $N$ than the 
number $M \ll N$ of independent terminals (users); each terminal 
is equipped with a single antenna only. The signal vector 
$\mathbf{s}_w \in \mathcal{O}^M$ contains information for each of the $M$ users, where 
$w = 1, \ldots, W$ indexes the OFDM tones, $W$ corresponds to 
the total number of OFDM tones, $\mathcal{O}$ represents the set of scalar 
complex-valued constellations, and $[\mathbf{s}_w]_m \in \mathcal{O}$ corresponds to 
the symbol at tone $w$ to be transmitted to user $m$.¹ We 
normalize the symbols to satisfy $\mathbb{E}\{|[\mathbf{s}_w]_m|^2\} = 1/M$. To 
shape the spectrum of the transmitted signals, OFDM systems 
typically specify certain unused tones (e.g., at both ends of the 
spectrum [8]). Hence, we set $\mathbf{s}_w = \mathbf{0}_{M \times 1}$ for $w \in \mathcal{T}^c$, where $\mathcal{T}$ 
designates the set of tones used for data transmission. 

In order to remove MUI, the signal vectors $\mathbf{s}_w$, $\forall w$ 
are passed through a precoder, which generates $W$ vectors 
$\mathbf{x}_w \in \mathbb{C}^N$ according to a given precoding scheme (see 
Section II-B). Since precoding causes the transmit power 
$P = \sum_{w=1}^{W} \|\mathbf{x}_w\|_2^2$ to depend on the signals $\mathbf{s}_w$, $\forall w$ and the 
channel state, we normalize the precoded vectors $\mathbf{x}_w$, $\forall w$ prior 
to transmission as 

$$\hat{\mathbf{x}}_w = \frac{\mathbf{x}_w}{\sqrt{\sum_{w'=1}^{W} \|\mathbf{x}_{w'}\|_2^2}}, \quad w = 1, \ldots, W, \tag{1}$$

which ensures unit transmit power. We emphasize that this 
normalization is an essential step in practice (i.e., to meet 
regulatory power constraints). To simplify the presentation, 
however, the normalization is omitted in the description of 
the precoders to follow (but the normalization is employed in all 
simulation results shown in Section V). Hence, in what follows 
$\mathbf{x}_w$ and $\hat{\mathbf{x}}_w$ are treated interchangeably. 

The (normalized) vectors $\hat{\mathbf{x}}_w$, $\forall w$ are then re-ordered (from 
user orientation to transmit-antenna orientation) according to 
the following one-to-one mapping: 

$$[\,\hat{\mathbf{x}}_1 \,\cdots\, \hat{\mathbf{x}}_W\,] = [\,\hat{\mathbf{a}}_1 \,\cdots\, \hat{\mathbf{a}}_N\,]^T. \tag{2}$$

Here, the $W$-dimensional vector $\hat{\mathbf{a}}_n$ corresponds to the 
(frequency-domain) signal to be transmitted from the $n$th 
antenna. The time-domain samples are obtained by applying 
the inverse DFT (IDFT) according to $\mathbf{a}_n = \mathbf{F}_W^H \hat{\mathbf{a}}_n$, followed 
by parallel-to-serial (P/S) conversion. Prior to modulation and 
transmission over the wireless channel, a cyclic prefix (CP) is 
added to the (time-domain) samples $\mathbf{a}_n$, $\forall n$ to avoid ISI [8]. 
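
The normalization, re-ordering, IDFT, and cyclic-prefix steps described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function name `transmit_chain`, the array layout (column w of `X` holds the precoded vector for tone w), the unitary IDFT scaling, and the chosen dimensions are all assumptions made for the example.

```python
import numpy as np

def transmit_chain(X, cp_len):
    """Normalize, re-order, and OFDM-modulate precoded vectors.

    X has shape (N, W); column w holds the precoded vector x_w.
    Returns an (N, cp_len + W) array; row n holds the cyclic-prefixed
    time-domain samples of antenna n.
    """
    # Normalization as in (1): unit total transmit power over all W tones.
    X_hat = X / np.sqrt(np.sum(np.abs(X) ** 2))
    # Re-ordering as in (2): row n of X_hat is the W-dimensional
    # frequency-domain vector of antenna n.
    A_freq = X_hat
    # IDFT per antenna; the unitary scaling is an assumption of this sketch.
    A_time = np.fft.ifft(A_freq, axis=1, norm="ortho")
    # Cyclic prefix: copy the last cp_len samples to the front.
    return np.concatenate([A_time[:, -cp_len:], A_time], axis=1)

# Hypothetical dimensions and random stand-ins for the precoded vectors.
rng = np.random.default_rng(0)
N, W, cp_len = 8, 16, 4
X = rng.standard_normal((N, W)) + 1j * rng.standard_normal((N, W))
tx = transmit_chain(X, cp_len)
power_without_cp = np.sum(np.abs(tx[:, cp_len:]) ** 2)
```

Because the assumed IDFT is unitary, the total power of the samples without the prefix stays at one after modulation.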

To simplify the exposition, we specify the input-output 
relation of the wireless channel in the frequency domain only. 
Concretely, we consider² 

$$\mathbf{y}_w = \mathbf{H}_w \hat{\mathbf{x}}_w + \mathbf{n}_w, \quad w = 1, \ldots, W, \tag{3}$$

where $\mathbf{y}_w$ denotes the $w$th receive vector, $\mathbf{H}_w \in \mathbb{C}^{M \times N}$ 
represents the MIMO channel matrix associated with the 
$w$th OFDM tone, and $\mathbf{n}_w$ is an $M$-vector of i.i.d. complex 
Gaussian noise with zero mean and variance $N_0$ per entry. 
The average receive signal-to-noise-ratio (SNR) is defined by 
$\mathrm{SNR} = 1/N_0$. Finally, each of the $M$ user terminals performs 
OFDM demodulation to obtain the received (frequency-domain) 
signals $[\mathbf{y}_w]_m$, $w = 1, \ldots, W$ (see Fig. 1). 

¹ For the sake of simplicity of exposition, we employ the same constellation 
for all users. An extension to the general case where different constellations 
are used by different users is straightforward. 

² We assume perfect synchronization and a CP that is longer than the 
maximum excess delay of the frequency-selective channel. 

Fig. 1. Large-scale MU-MIMO-OFDM downlink (left: BS with $N$ transmit antennas; right: $M$ independent single-antenna terminals). The proposed downlink 
transmission scheme, referred to as PMP, combines MU precoding, OFDM modulation, and PAR reduction (highlighted by the dashed box in the BS). 



B. MU Precoding Schemes 

In order to avoid MUI, precoding must be employed at the 
BS. To this end, we assume the channel matrices $\mathbf{H}_w$, $\forall w$ to be 
known perfectly at the transmit side.³ Linear precoding now 
amounts to transmitting $\mathbf{x}_w = \mathbf{G}_w \mathbf{s}_w$, where $\mathbf{G}_w \in \mathbb{C}^{N \times M}$ 
is a suitable precoding matrix. One of the most prominent 
precoding schemes is least-squares (LS) precoding (or linear 
zero-forcing precoding), which corresponds to $\mathbf{G}_w = \mathbf{H}_w^\dagger$. 
Since $\mathbf{H}_w \mathbf{H}_w^\dagger = \mathbf{I}_M$, transmitting $\mathbf{x}_w = \mathbf{H}_w^\dagger \mathbf{s}_w$ perfectly 
removes all MUI, i.e., it transforms (3) into $M$ independent 
single-stream systems $\mathbf{y}_w = \mathbf{s}_w + \mathbf{n}_w$. Note that LS precoding 
is equivalent to transmitting the solution $\mathbf{x}_w$ to the following 
convex optimization problem: 

(LS)  $\underset{\mathbf{x}}{\text{minimize}} \;\; \|\mathbf{x}\|_2 \quad \text{subject to} \;\; \mathbf{s}_w = \mathbf{H}_w \mathbf{x}.$

This formulation inspired us to state the MU-MIMO-OFDM 
downlink transmission scheme proposed in Section III as a 
convex optimization problem. 
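
As a small numerical illustration of LS precoding for a single tone, the sketch below builds the pseudo-inverse from its definition, checks that the precoding constraint holds, and compares the norm of the LS vector with another feasible vector. The dimensions, the random channel, and the QPSK-like symbols are assumptions chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 32  # hypothetical: 4 users, 32 BS antennas

# Random stand-in for the channel matrix H_w of one tone.
H = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
# QPSK-like symbols scaled so that |[s]_m|^2 = 1/M.
s = (rng.choice([-1, 1], M) + 1j * rng.choice([-1, 1], M)) / np.sqrt(2 * M)

# Pseudo-inverse H^H (H H^H)^{-1}, valid when H has full row rank.
G = H.conj().T @ np.linalg.inv(H @ H.conj().T)
x_ls = G @ s

# Residual of the precoding constraint s = H x (zero means no MUI).
mui_residual = np.linalg.norm(H @ x_ls - s)

# Another feasible vector: add a component from the null space of H.
z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x_other = x_ls + (z - G @ (H @ z))
other_residual = np.linalg.norm(H @ x_other - s)
```

Both vectors satisfy the precoding constraint, and `x_ls` has the smaller Euclidean norm, which is the sense in which LS precoding solves (LS).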

Several other linear precoding schemes have been proposed 
in the literature, such as matched-filter (MF) precoding, 
minimum-mean square-error (MMSE) precoding [17], or more 
sophisticated non-linear schemes, such as dirty-paper coding [22]. 
In the remainder of the paper, we will occasionally 
consider MF precoding, which corresponds to $\mathbf{G}_w = \mathbf{H}_w^H$. 
Since $\mathbf{H}_w \mathbf{H}_w^H$ is, in general, not a diagonal matrix, MF 
is normally unable to remove the MUI. Nevertheless, MF 
precoding was shown in [6] to be competitive for large-scale 
MIMO in some operating regimes and in [3] to perfectly 
remove MUI in the large-antenna limit, i.e., when $N \to \infty$. 

³ In large-scale MU-MIMO systems, channel-state information at the 
transmitter would probably be acquired through pilot-based training in the uplink 
and by exploiting reciprocity of the wireless channel [2], [3]. 



C. Peak-to-Average Ratio (PAR) 

The IDFT required at the transmitter causes the OFDM 
signals $\mathbf{a}_n$, $\forall n$ to exhibit a large dynamic range [8]. Such 
signals are susceptible to non-linear distortions (e.g., saturation 
or clipping) typically induced by real-world RF components. 
To avoid unwanted out-of-band radiation and signal distortions 
altogether, linear RF components and PAR-reduction schemes 
are key to successfully deploy OFDM in practical systems. 

1) PAR Definition: The dynamic range of the transmitted 
OFDM signals is typically characterized through the peak-to-average 
(power) ratio (PAR). Since many real-world RF-chain 
implementations process and modulate the real and imaginary 
part independently, we define the PAR at the $n$th transmit 
antenna as⁴ 

$$\mathrm{PAR}_n = \frac{2W \|\mathbf{a}_n\|_{\widetilde{\infty}}^2}{\|\mathbf{a}_n\|_2^2}. \tag{4}$$

As a consequence of standard vector-norm relations, (4) satisfies 
$1 \le \mathrm{PAR}_n \le 2W$. Here, the upper bound corresponds 
to the worst-case PAR and is achieved for signals having only 
a single (real or imaginary) non-zero entry. The lower bound 
corresponds to the best case and is realized by transmit vectors 
whose (real and imaginary) entries have constant modulus. 
To minimize distortion due to hardware non-linearities, the 
transmit signals should have a PAR that is close to one; this 
can either be achieved by CE transmission [7] or by using 
sophisticated PAR-reduction schemes. 
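
The PAR definition and its two extreme cases can be checked numerically. The following is a minimal sketch; the function name `par` and the test vectors are illustrative choices, not part of the paper.

```python
import numpy as np

def par(a):
    """PAR of one antenna's time-domain samples: the peak is taken over
    the real and imaginary parts separately, as in definition (4)."""
    a = np.asarray(a, dtype=complex)
    W = a.size
    peak = max(np.max(np.abs(a.real)), np.max(np.abs(a.imag)))
    return 2 * W * peak ** 2 / np.sum(np.abs(a) ** 2)

W = 64
# Best case: real and imaginary entries all have the same modulus.
constant_modulus = np.ones(W) + 1j * np.ones(W)
# Worst case: a single non-zero (real) entry.
single_entry = np.zeros(W, dtype=complex)
single_entry[3] = 1.0

par_best = par(constant_modulus)
par_worst = par(single_entry)
```

With these inputs, `par_best` evaluates to the lower bound 1 and `par_worst` to the upper bound 2W.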

2) PAR-Reduction Schemes for OFDM: Prominent PAR-reduction 
schemes for single-antenna communication systems 
are selected mapping (SM) [10], partial transmit sequences [11], 
active constellation extension (ACE) [12], and 
tone reservation (TR) [13], [15]. PAR-reduction schemes for 
point-to-point MIMO systems mostly rely on SM or ACE and 
have been described in, e.g., [14], [16]. For the MU-MIMO 
downlink, a method relying on Tomlinson-Harashima precoding 
and lattice reduction has been introduced recently in [19]; 
this method, however, requires dedicated signal-processing 
algorithms at both ends of the wireless link (e.g., modulo 
reduction in the receiver). In contrast, the transmission scheme 
developed next aims at reducing the PAR by only exploiting 
the excess of transmit antennas available at the BS. This 
approach has the key advantage of being transparent to the 
receivers, i.e., it does not require any special signal-processing 
algorithms in the (mobile) terminals. Hence, the proposed 
precoding scheme can be deployed in existing MIMO-OFDM 
systems for which channel-state information is available at the 
transmitter, such as IEEE 802.11n [20]. 

⁴ Note that alternative PAR definitions exist in the literature, e.g., using 
the $\ell_\infty$-norm in the numerator instead of the $\ell_{\widetilde{\infty}}$-norm (and $W$ instead of 
$2W$). Nevertheless, the relation $\frac{1}{2}\|\mathbf{a}_n\|_\infty^2 \le \|\mathbf{a}_n\|_{\widetilde{\infty}}^2 \le \|\mathbf{a}_n\|_\infty^2$ shown in 
[21, Eq. 12] ensures that reducing the PAR as defined in (4) also reduces an 
$\ell_\infty$-norm-based PAR definition (and vice versa). Moreover, the theory and 
algorithms presented in this paper can, for example, be formulated to directly 
reduce an $\ell_\infty$-norm-based PAR. 

III. Downlink Transmission Scheme 

The main idea of the downlink transmission scheme de- 
veloped next is to jointly perform MU precoding, OFDM 
modulation, and PAR reduction, by exploiting the DoF avail- 
able in large-scale MU-MIMO systems. To convey the basic 
idea and to characterize its fundamental properties, we start 
by considering a simplified MIMO system. We then present 
the MU-MIMO-OFDM downlink transmission scheme in full 
detail and conclude by discussing possible extensions. 

A. Basic Idea and Fundamental Properties 

To convey the main idea of the proposed precoding method, 
let us consider an OFDM-free (narrow-band, flat-channel) 
MU-MIMO system with the real-valued input-output relation 
y = Hx + n and an M × N channel matrix H satisfying 
M < N. To eliminate MUI, the transmit vector x must satisfy 
the precoding constraint s = Hx, which ensures that y = s+n 
when transmitting the vector x. Since M < N, the equation 
s = Hx is underdetermined; this implies that there are, in 
general, infinitely many solutions x satisfying the precoding 
constraint. Our hope is now to find a suitable vector x having 
a small dynamic range (or low PAR). 

A straightforward approach that reduces the dynamic range 
is to transmit the solution $\mathbf{x}$ of the following optimization 
problem: 

(P-DYN)  $\underset{\alpha,\beta,\mathbf{x}}{\text{minimize}} \;\; \alpha - \beta \quad \text{subject to} \;\; \mathbf{s} = \mathbf{H}\mathbf{x}, \;\; \beta \le |[\mathbf{x}]_i| \le \alpha, \; \forall i.$

Unfortunately, the second constraint $\beta \le |[\mathbf{x}]_i| \le \alpha$, $\forall i$ causes 
this problem to be non-convex and hence, finding the solution 
of (P-DYN) with efficient algorithms seems to be difficult. 

1) Convex Relaxation: To arrive at an optimization problem 
that reduces the dynamic range and can be solved efficiently, 
we relax (P-DYN). Specifically, $\beta \le |[\mathbf{x}]_i| \le \alpha$ is replaced by 
$|[\mathbf{x}]_i| \le \alpha$, which leads to the following convex optimization 
problem: 

(P-INF)  $\underset{\mathbf{x}}{\text{minimize}} \;\; \|\mathbf{x}\|_\infty \quad \text{subject to} \;\; \mathbf{s} = \mathbf{H}\mathbf{x}.$

Intuitively, as (P-INF) minimizes the magnitude of the largest 
entry of $\mathbf{x}$, we can expect that its solution $\mathbf{x}$ exhibits low PAR. 
In fact, the solution of (P-INF) has potentially smaller PAR than a transmit 
vector resulting from LS precoding. To show this, we note 
that $\|\mathbf{x}\|_\infty \le \|\mathbf{H}^\dagger\mathbf{s}\|_\infty$, where $\mathbf{x}$ is the minimizer of (P-INF) 
and $\mathbf{H}^\dagger\mathbf{s}$ corresponds to the LS-precoded vector. Since $\mathbf{H}^\dagger\mathbf{s}$ 
is the $\ell_2$-norm minimizer, we have $\|\mathbf{H}^\dagger\mathbf{s}\|_2 \le \|\mathbf{x}\|_2$ and, 
consequently, the PAR-levels of (P-INF) and of LS precoding 
satisfy 

$$\mathrm{PAR}_{\text{P-INF}} = \frac{N\|\mathbf{x}\|_\infty^2}{\|\mathbf{x}\|_2^2} \le \frac{N\|\mathbf{H}^\dagger\mathbf{s}\|_\infty^2}{\|\mathbf{H}^\dagger\mathbf{s}\|_2^2} = \mathrm{PAR}_{\text{LS}},$$

which implies that the PAR associated with (P-INF) cannot be 
larger than that of LS precoding. We confirm this observation 
in Section V, where the proposed downlink transmission 
scheme is shown to achieve substantially lower PAR than 
LS precoding. 

2) Benefits of Large-Scale MIMO: To characterize the 
benefit of having a large number of transmit antennas at the 
BS on the PAR when using (P-INF), we first restate a key 
result from [23]. 

Proposition 1 ([23, Prop. 1]): Let $\mathbf{H}$ have full (column) 
rank and $1 < M < N$. Generally, the solution $\mathbf{x}$ to (P-INF) 
has $N - M + 1$ entries with magnitude equal to $\|\mathbf{x}\|_\infty$ and 
the $M - 1$ remaining entries have smaller magnitude. 

With this proposition, we are able to derive the following 
upper bound on the PAR when performing precoding according 
to (P-INF): 

$$\mathrm{PAR}_{\text{P-INF}} = \frac{N\|\mathbf{x}\|_\infty^2}{\|\mathbf{x}\|_2^2} \le \frac{N}{N - M + 1}. \tag{5}$$

Here, the inequality is an immediate consequence 
of Proposition 1, i.e., we have 

$$\|\mathbf{x}\|_2^2 \ge \sum_{i \in \mathcal{X}} \|\mathbf{x}\|_\infty^2 = (N - M + 1)\|\mathbf{x}\|_\infty^2,$$

where $\mathcal{X}$ is the set of indices associated with the $N - M + 1$ 
entries of $\mathbf{x}$ for which $|[\mathbf{x}]_i| = \|\mathbf{x}\|_\infty$. It is now key to realize 
that for a constant number of users $M$ and in the large-antenna 
limit $N \to \infty$, the bound (5) implies that $\mathrm{PAR}_{\text{P-INF}} \to 1$. 
Hence, for systems having a significantly larger number of 
transmit antennas than users (as is the case for typical large-scale 
MU-MIMO systems [2], [3], [5], [6]), a precoder that 
implements (P-INF) is able to achieve a PAR that is arbitrarily 
close to unity. This means that in the large-antenna limit of 
$N \to \infty$, (P-INF) yields constant-envelope signals, while 
being able to perfectly eliminate the MUI. 
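
To make the large-antenna behavior of the bound N/(N − M + 1) concrete, the short sketch below evaluates it for a fixed number of users and a growing number of antennas. The helper name `par_upper_bound` and the chosen values of N and M are assumptions for illustration.

```python
import math

def par_upper_bound(N, M):
    """Upper bound N / (N - M + 1) on the PAR of the (P-INF) solution."""
    if not 1 < M < N:
        raise ValueError("Proposition 1 assumes 1 < M < N")
    return N / (N - M + 1)

M = 10  # fixed number of users (illustrative)
bounds = {N: par_upper_bound(N, M) for N in (20, 100, 1000)}
bounds_db = {N: 10 * math.log10(b) for N, b in bounds.items()}
```

For the example from the introduction (a hundred antennas serving ten users), the bound is 100/91, i.e., roughly 1.1, and it decreases toward one as N grows.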

B. Joint Precoding, Modulation, and PAR Reduction (PMP) 

The application of (P-INF) to each time-domain sample 
after OFDM modulation would reduce the PAR but, unfor- 
tunately, would no longer allow the equalization of ISI using 
conventional OFDM demodulation. In fact, such a straightfor- 
ward PAR-reduction approach would necessitate the deploy- 
ment of sophisticated equalization schemes in each terminal. 
To enable the use of conventional OFDM demodulation in the 
receiver, we next formulate the convex optimization problem, 
which jointly performs MU precoding, OFDM modulation, 
and PAR reduction. 

We start by specifying the necessary constraints. In order to 
remove MUI, the following precoding constraints must hold: 

$$\mathbf{s}_w = \mathbf{H}_w \mathbf{x}_w, \quad w \in \mathcal{T}. \tag{6}$$

To ensure certain desirable spectral properties of the transmitted 
OFDM signals, the inactive OFDM tones (indexed by $\mathcal{T}^c$) 
must satisfy the following shaping constraints: 

$$\mathbf{0}_{N \times 1} = \mathbf{x}_w, \quad w \in \mathcal{T}^c. \tag{7}$$

PAR reduction is achieved similarly to (P-INF), with the main 
difference that we want to minimize the $\ell_{\widetilde{\infty}}$-norm of the 
time-domain samples $\mathbf{a}_n$, $\forall n$. In order to simplify notation, 
we define the (linear) mapping between the time-domain 
samples $\mathbf{a}_n$, $\forall n$, and the $w$th (frequency-domain) transmit 
vector $\mathbf{x}_w$ as $\mathbf{x}_w = f_w(\mathbf{a}_1, \ldots, \mathbf{a}_N)$, where the linear function 
$f_w(\cdot)$ applies the DFT according to $\hat{\mathbf{a}}_n = \mathbf{F}_W \mathbf{a}_n$, $\forall n$ and 
performs the re-ordering defined in (2). 

With (6) and (7), we are able to formulate the downlink 
transmission scheme as a convex optimization problem: 

(PMP)  $\underset{\mathbf{a}_1,\ldots,\mathbf{a}_N}{\text{minimize}} \;\; \max\{\|\mathbf{a}_1\|_{\widetilde{\infty}}, \ldots, \|\mathbf{a}_N\|_{\widetilde{\infty}}\}$
       $\text{subject to} \;\; \mathbf{s}_w = \mathbf{H}_w f_w(\mathbf{a}_1, \ldots, \mathbf{a}_N), \;\; w \in \mathcal{T},$
       $\phantom{\text{subject to}} \;\; \mathbf{0}_{N \times 1} = f_w(\mathbf{a}_1, \ldots, \mathbf{a}_N), \;\; w \in \mathcal{T}^c.$
The vectors $\mathbf{a}_n$, $\forall n$ which minimize (PMP) correspond to 
the time-domain OFDM samples to be transmitted from each 
antenna. Following the reasoning of Section III-A, we expect 
these vectors to have low PAR (see Section V for corresponding 
simulation results). In what follows, PMP refers to the 
general method of jointly performing precoding, modulation, 
and PAR reduction, whereas (PMP) refers to the actual 
optimization problem stated above. 
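
To illustrate the structure of (PMP), the sketch below evaluates its objective and the residuals of the precoding and shaping constraints for a given set of time-domain samples. It does not solve (PMP); it only checks a feasible point obtained from per-tone LS precoding. The function names, the array layout, the unitary DFT scaling, and the choice of unused tones are assumptions of this example.

```python
import numpy as np

def f_all(A):
    """Apply the DFT per antenna. A has shape (N, W) with row n = a_n;
    column w of the result plays the role of x_w = f_w(a_1, ..., a_N)."""
    return np.fft.fft(A, axis=1, norm="ortho")

def pmp_objective(A):
    """max_n of the largest real or imaginary magnitude in a_n."""
    return max(np.abs(A.real).max(), np.abs(A.imag).max())

def pmp_residuals(A, H, S, used):
    """Residual norms of the precoding constraints on the used tones
    and of the shaping constraints on the unused tones."""
    X = f_all(A)
    prec = np.sqrt(sum(np.linalg.norm(S[:, w] - H[w] @ X[:, w]) ** 2
                       for w in np.flatnonzero(used)))
    shape = np.linalg.norm(X[:, ~used])
    return prec, shape

rng = np.random.default_rng(2)
M, N, W = 2, 8, 16                      # hypothetical dimensions
used = np.ones(W, dtype=bool)
used[[0, W - 1]] = False                # two unused tones (illustrative)
H = rng.standard_normal((W, M, N)) + 1j * rng.standard_normal((W, M, N))
S = (rng.choice([-1, 1], (M, W)) + 1j * rng.choice([-1, 1], (M, W))) / np.sqrt(2 * M)
S[:, ~used] = 0

# Feasible point: LS precoding on the used tones, zeros elsewhere.
X = np.zeros((N, W), dtype=complex)
for w in np.flatnonzero(used):
    X[:, w] = np.linalg.pinv(H[w]) @ S[:, w]
A = np.fft.ifft(X, axis=1, norm="ortho")

r_prec, r_shape = pmp_residuals(A, H, S, used)
objective = pmp_objective(A)
```

Any solver for (PMP) would search among the points with zero residuals for the one with the smallest `objective`.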

C. Relaxation of (PMP) 

The high dimensionality of (PMP) for large-scale MIMO 
systems necessitates corresponding efficient optimization 
algorithms. To this end, we relax the constraints of (PMP) to 
arrive at an optimization problem that can be solved efficiently 
using the algorithm developed in Section IV. 

To simplify the notation, we aggregate all time-domain 
vectors in $\mathbf{a} = [\,\mathbf{a}_1^T \,\cdots\, \mathbf{a}_N^T\,]^T$ and rewrite the constraints of 
(PMP) as a single linear system of equations. Specifically, both 
constraints in (PMP) can be rewritten as $\mathbf{b} = \mathbf{C}\mathbf{a}$, where the 
vector $\mathbf{b}$ is a concatenation of $\mathbf{s}_w$, $w \in \mathcal{T}$ and $|\mathcal{T}^c|$ all-zeros 
vectors of dimension $N$; the matrix $\mathbf{C}$ implements the 
right-hand side of the constraints (6) and (7), i.e., also includes the 
inverse Fourier transforms.⁵ We can now re-state (PMP) in 
more compact form as 

(PMP)  $\underset{\mathbf{a}}{\text{minimize}} \;\; \|\mathbf{a}\|_{\widetilde{\infty}} \quad \text{subject to} \;\; \mathbf{b} = \mathbf{C}\mathbf{a}.$

In practice, it is desirable to relax the constraint $\mathbf{b} = \mathbf{C}\mathbf{a}$. 
Firstly, from an implementation point-of-view, relaxing the 
constraints in (PMP) enables us to develop an efficient 
algorithm (see Section IV). Secondly, in the medium-to-low SNR 
regime, the effect of thermal noise at the receiver is 
comparable to that of MUI and out-of-band interference. Hence, 
relaxing the equation $\mathbf{b} = \mathbf{C}\mathbf{a}$ to $\|\mathbf{b} - \mathbf{C}\mathbf{a}\|_2 \le \eta$ does not 
significantly degrade the performance for small values of $\eta$. To 
develop an efficient algorithm for the large dimensions faced 
in large-scale MU-MIMO-OFDM systems (see Section IV), 
we state a relaxed version of (PMP) in Lagrangian form as 

(PMP-L)  $\underset{\mathbf{a}}{\text{minimize}} \;\; \lambda\|\mathbf{a}\|_{\widetilde{\infty}} + \|\mathbf{b} - \mathbf{C}\mathbf{a}\|_2^2,$

where $\lambda > 0$ is a regularization parameter. Note that (PMP-L) 
is an $\ell_{\widetilde{\infty}}$-norm regularized LS problem and $\lambda$ allows one 
to trade fidelity to the constraints with the amount of PAR 
reduction (similarly to the parameter $\eta$); the associated 
trade-offs are investigated in Section V-D. Note that the algorithm 
developed in Section IV operates on real-valued variables. 
To this end, (PMP) and (PMP-L) must be transformed into 
equivalent real-valued problems. This transformation, however, 
is straightforward and we omit the details due to space 
limitations. 

⁵ For the sake of simplicity of exposition, the actual structural details of the 
matrix $\mathbf{C}$ are omitted. 

D. Extensions of PMP 

The basic ideas behind PMP can be extended to several 
other scenarios. Corresponding examples are outlined in the 
next paragraphs. 

1) Emulating Other Linear Precoders: By replacing the 
precoding constraints in (6) by 

$$\mathbf{H}_w \mathbf{P}_w \mathbf{s}_w = \mathbf{H}_w \mathbf{x}_w, \quad w \in \mathcal{T}, \tag{8}$$

where $\mathbf{P}_w$ is an $N \times M$ precoding matrix of choice, one can 
generalize PMP to a variety of linear precoders. We emphasize 
that this generalization allows one to trade MUI removal with 
noise enhancement and could be used to take into account 
imperfect channel-state information at the transmitter, e.g., by 
using a minimum mean-square error precoder (see, e.g., [17]). 

2) Peak-Power Constrained Optimization: Instead of 
normalizing the power of the transmitted vectors as in (1), one 
may want to impose a predefined upper bound $P_{\max}$ on the 
transmit power already in the optimization problem. To this 
end, an additional constraint of the form $\|\mathbf{a}\|_2^2 \le P_{\max}$ could 
be added to (PMP), which ensures that (if a feasible solution 
exists) the transmit power does not exceed $P_{\max}$. This 
constraint maintains the convexity of (PMP) but requires the 
development of a novel algorithm, as the algorithm proposed in 
Section IV is unable to consider such peak-power constraints. 

3) Combining PMP with Tone-Reservation (TR): In [15], 
the authors proposed to combine Kashin representations [23], 
[24] with TR to reduce the PAR in OFDM-based communication 
systems. The underlying idea is to obtain a time-domain 
signal that exhibits low PAR by exploiting the DoF offered by 
TR. We emphasize that PMP can easily be combined with TR, 
by removing certain precoding constraints (6). Specifically, 
only a subset $\mathcal{T}_d \subset \mathcal{T}$ is used for data transmission; the 
remaining tones $\mathcal{T}_d^c$ are reserved for PAR reduction. This 
approach offers additional DoF and is, therefore, expected to 
further improve the PAR-reduction capabilities of PMP. 

4) Application to Point-to-Point MIMO Systems: The proposed 
transmission scheme can be used for point-to-point 
MIMO systems for which channel-state information is available 
at the transmitter, e.g., IEEE 802.11n [20]. In such 
systems, MUI does not need to be removed as the MIMO 
detector is able to separate the transmitted data streams; 
hence, there is potentially more flexibility in the choice of the 
precoding matrices $\mathbf{P}_w$, $\forall w$, as opposed to in a MU-MIMO 
scenario, which requires the removal of MUI. 

5) Application to Single-Carrier Systems: The idea of PMP, 
i.e., to simultaneously perform precoding, modulation, and 
PAR reduction, can also be adapted for single-carrier large-scale 
MIMO systems exhibiting ISI. To this end, one might 
want to replace the constraints in (P-INF) by⁶ 

[Equation not recoverable from the source: a block-Toeplitz system built from the tap matrices $\mathbf{H}_1, \ldots, \mathbf{H}_D$ and all-zeros blocks $\mathbf{0}_{M \times N}$, relating the symbols $\mathbf{s}_1, \ldots, \mathbf{s}_Q$ to the vectors $\mathbf{x}_1, \ldots, \mathbf{x}_Q$.]

and minimize the $\ell_\infty$-norm of the vector $\mathbf{x} = [\,\mathbf{x}_1^T \,\cdots\, \mathbf{x}_Q^T\,]^T$, 
which contains the PAR-reduced time-domain samples to be 
transmitted. The channel matrices $\mathbf{H}_t$ are associated to the 
delay (or tap) $t = 1, \ldots, D$, the information symbols are 
denoted by $\mathbf{s}_q$, $q = 1, \ldots, Q$, and $Q > D$ refers to the number 
of transmitted information symbols per block. Alternatively to 
PMP, the CE precoding scheme developed in [7] can also be 
used with the constraints given above. A detailed investigation 
of both transmission schemes is, however, left for future work. 



⁶ Note that the exact structure of the Toeplitz matrix depends on the pre- 
and post-ambles of the used block-transmission scheme. 

IV. Fast Iterative Truncation Algorithm 

A common approach to solve optimization problems of the 
form (PMP) and (PMP-L) is to use interior-point methods [25]. 
Such methods, however, often result in prohibitively high 
computational complexity for the problem sizes faced in large-scale 
MIMO systems. Hence, to enable practical implementation, 
more efficient algorithms are of paramount importance. 
While a large number of computationally efficient algorithms 
for the $\ell_1$-norm regularized LS problem have been developed 
in the compressive-sensing and sparse-signal recovery literature, 
e.g., [26], efficient solvers for the $\ell_{\widetilde{\infty}}$-norm regularized 
LS problem (PMP-L), however, seem to be missing. 

A. Summary of ISTA/FISTA 

In this section, we summarize the framework developed 
in [27] for $\ell_1$-norm-based LS, which builds the basis of the 
algorithm derived in Section IV-B for solving (PMP-L). 

1) ISTA: The goal of the iterative soft-thresholding algorithm 
(ISTA) developed in [27] is to compute the solution $\mathbf{x}$ 
to real-valued convex optimization problems of the form 

(P)  $\underset{\mathbf{x}}{\text{minimize}} \;\; F(\mathbf{x}) = g(\mathbf{x}) + h(\mathbf{x}),$

where $g(\mathbf{x})$ is a real-valued continuous convex function that is 
possibly non-smooth and $h(\mathbf{x})$ is a smooth convex function, 
which is continuously differentiable with the Lipschitz constant $L$. 
The resulting algorithms are initialized by an arbitrary 
vector $\mathbf{x}_0$. The main ingredient of ISTA is the proximal map 
defined as [27] 

$$p_L(\mathbf{y}) = \underset{\mathbf{x}}{\arg\min} \left\{ g(\mathbf{x}) + \frac{L}{2} \left\| \mathbf{x} - \left( \mathbf{y} - \frac{1}{L} \nabla h(\mathbf{y}) \right) \right\|_2^2 \right\}, \tag{9}$$

which constitutes the main iteration step defined as: 

$$\mathbf{x}_k = p_L(\mathbf{x}_{k-1}), \quad k = 1, \ldots, K.$$

Here, $K$ denotes the maximum number of iterations. We 
emphasize that (9) has a simple closed-form solution for 
$\ell_1$-norm regularized LS, leading to a low-complexity first-order 
algorithm, i.e., an algorithm requiring matrix-vector 
multiplications and simple shrinkage operations only. This 
property renders ISTA an attractive solution for PMP, as the 
involved matrix $\mathbf{C}$ and its adjoint $\mathbf{C}^H$ exhibit a structure that 
enables fast matrix-vector multiplication (see Section III-C). 

2) Fast Version of ISTA: As detailed in [27], ISTA exhibits 
sub-linear convergence, i.e., $F(\mathbf{x}_k) - F(\mathbf{x}^*) \simeq O(1/k)$, 
where $\mathbf{x}^*$ designates the optimal solution to (P). In order to 
improve the convergence rate, a fast version of ISTA, referred 
to as FISTA, was developed in [27]. The main idea of FISTA 
is to evaluate the proximal map (9) with a (linear) combination 
of the previous two points $(\mathbf{x}_{k-1}, \mathbf{x}_{k-2})$ instead of $\mathbf{x}_{k-1}$ only 
(see [27] for the details), which improves the convergence rate 
to $F(\mathbf{x}_k) - F(\mathbf{x}^*) \simeq O(1/k^2)$ and builds the foundation of 
the algorithm for solving (PMP-L) described next. 
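
For the $\ell_1$-norm regularized LS case mentioned in this section, the ISTA iteration reduces to a gradient step followed by soft-thresholding. The sketch below is a minimal illustration under stated assumptions: it takes $g(\mathbf{x}) = \lambda\|\mathbf{x}\|_1$ and $h(\mathbf{x}) = \|\mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2$ with real-valued data, uses $L = 2\sigma_{\max}^2(\mathbf{H})$, and the function names, problem sizes, and test signal are hypothetical.

```python
import numpy as np

def soft_threshold(w, tau):
    """Element-wise shrinkage: closed-form proximal map of tau * ||.||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def ista_l1(H, s, lam, num_iters):
    """ISTA for: minimize lam * ||x||_1 + ||s - H x||_2^2 (real-valued)."""
    L = 2.0 * np.linalg.norm(H, 2) ** 2      # Lipschitz constant of grad h
    x = np.zeros(H.shape[1])
    for _ in range(num_iters):
        w = x - (2.0 / L) * (H.T @ (H @ x - s))  # gradient step on h
        x = soft_threshold(w, lam / L)           # proximal step on g
    return x

def objective(H, s, lam, x):
    return lam * np.sum(np.abs(x)) + np.sum((s - H @ x) ** 2)

# Hypothetical sparse-recovery style test problem.
rng = np.random.default_rng(4)
H = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]
s = H @ x_true
lam = 0.1

F_start = objective(H, s, lam, np.zeros(50))
F_10 = objective(H, s, lam, ista_l1(H, s, lam, 10))
F_200 = objective(H, s, lam, ista_l1(H, s, lam, 200))
```

Each iteration costs two matrix-vector products plus an element-wise operation, which is the low-complexity property the text refers to.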



B. Fast Iterative Truncation Algorithm (FITRA) 

To simplify the derivation of the first-order algorithm for
solving (PMP-L), we describe the algorithm for solving the
Lagrangian variant of (P-INF) defined as follows:

(P-INF-L)  $\underset{\mathbf{x}}{\text{minimize}} \;\; \lambda\|\mathbf{x}\|_\infty + \|\mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2.$

First, we must compute the (smallest) Lipschitz constant $L$
for the function $h(\mathbf{x}) = \|\mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2$ and then evaluate the
proximal map (9) for the functions $g(\mathbf{x}) = \lambda\|\mathbf{x}\|_\infty$ and $h(\mathbf{x})$.

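1) FITRA: The (smallest) Lipschitz constant of the gradient
$\nabla h(\mathbf{x})$ corresponds to $L = 2\sigma_{\max}^2(\mathbf{H})$, which can, for example,
be calculated efficiently using the power method [28]. To
compute the proximal map (9) for (P-INF-L), we define the
auxiliary vector

$$\mathbf{w} = \mathbf{y} - \frac{1}{L}\nabla h(\mathbf{y}) = \mathbf{y} - \frac{2}{L}\mathbf{H}^T(\mathbf{H}\mathbf{y} - \mathbf{s}),$$

which enables us to re-write the proximal map (9) in more
compact form as

$$p_L(\mathbf{y}) = \arg\min_{\mathbf{x}} \left\{ \lambda\|\mathbf{x}\|_\infty + \frac{L}{2}\|\mathbf{x} - \mathbf{w}\|_2^2 \right\}. \qquad (10)$$

Unfortunately, (10) does, in contrast to $\ell_1$-norm regularized
LS, not have a simple closed-form solution for (P-INF-L).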
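Nevertheless, standard algebraic manipulations enable us to
evaluate the proximal map efficiently using the following
two-step approach: First, we compute

$$\alpha = \arg\min_{\tilde{\alpha}} \left\{ \lambda\tilde{\alpha} + \frac{L}{2}\sum_{i=1}^{N}\left(\left[\,|[\mathbf{w}]_i| - \tilde{\alpha}\,\right]^+\right)^2 \right\}, \qquad (11)$$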






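Algorithm 1 Fast Iterative Truncation Algorithm (FITRA)
1: initialize $\mathbf{x}_0 \leftarrow \mathbf{0}_{N \times 1}$, $\mathbf{y}_1 \leftarrow \mathbf{x}_0$, $t_1 \leftarrow 1$, $L \leftarrow 2\sigma_{\max}^2(\mathbf{H})$
2: for $k = 1, \ldots, K$ do
3:   $\mathbf{w} \leftarrow \mathbf{y}_k - \frac{2}{L}\mathbf{H}^T(\mathbf{H}\mathbf{y}_k - \mathbf{s})$
4:   $\alpha \leftarrow \arg\min_{\tilde{\alpha}} \{ \lambda\tilde{\alpha} + \frac{L}{2}\sum_{i=1}^{N} ([\,|[\mathbf{w}]_i| - \tilde{\alpha}\,]^+)^2 \}$
5:   $\mathbf{x}_k \leftarrow \mathrm{trunc}_\alpha(\mathbf{w})$
6:   $t_{k+1} \leftarrow \frac{1}{2}\left(1 + \sqrt{1 + 4t_k^2}\right)$
7:   $\mathbf{y}_{k+1} \leftarrow \mathbf{x}_k + \frac{t_k - 1}{t_{k+1}}(\mathbf{x}_k - \mathbf{x}_{k-1})$
8: end for
9: return $\mathbf{x}_K$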



for which general-purpose scalar optimization algorithms, such
as the bisection method [29], can be used. Then, we apply
element-wise truncation (clipping) of $\mathbf{w}$ to the interval $[-\alpha, \alpha]$
according to $p_L(\mathbf{y}) = \mathrm{trunc}_\alpha(\mathbf{w})$. The truncation operator
applied to the scalar $x \in \mathbb{R}$ is defined as

$$\mathrm{trunc}_\alpha(x) = \min\{\max\{x, -\alpha\}, +\alpha\}.$$

The resulting first-order algorithm, including the methods
proposed in [27] to improve the convergence rate (compared
to ISTA), is detailed in Algorithm 1 and referred to as the fast
iterative truncation algorithm (FITRA).
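
The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`lipschitz_constant`, `clip_level`, `fitra`), the fixed iteration counts of the power method and of the bisection, and the real-valued random test problem are all hypothetical choices made for this example.

```python
import numpy as np

def lipschitz_constant(H, iters=200, seed=0):
    """Estimate L = 2 * sigma_max(H)^2 with the power method on H^T H."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(H.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = H.T @ (H @ v)
        v = u / np.linalg.norm(u)
    # Rayleigh quotient of the unit vector v approximates sigma_max(H)^2.
    return 2.0 * float(v @ (H.T @ (H @ v)))

def clip_level(w, lam, L, iters=60):
    """Bisection for the scalar problem in line 4 of Algorithm 1.

    The objective lam*a + (L/2)*sum(max(|w_i| - a, 0)^2) is convex in a,
    with derivative lam - L*sum(max(|w_i| - a, 0)), which is non-decreasing.
    """
    abs_w = np.abs(w)
    if lam >= L * abs_w.sum():      # derivative already >= 0 at a = 0
        return 0.0
    lo, hi = 0.0, float(abs_w.max())
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam - L * np.maximum(abs_w - mid, 0.0).sum() < 0.0:
            lo = mid                # minimizer lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fitra(H, s, lam, K=2000):
    """Real-valued FITRA iterations following Algorithm 1."""
    L = lipschitz_constant(H)
    x_prev = np.zeros(H.shape[1])   # x_0
    y, t = x_prev.copy(), 1.0       # y_1, t_1
    for _ in range(K):
        w = y - (2.0 / L) * (H.T @ (H @ y - s))
        alpha = clip_level(w, lam, L)
        x = np.clip(w, -alpha, alpha)            # trunc_alpha(w)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev
```

Calling `fitra(H, s, lam=0.25)` on a wide matrix `H` (more columns than rows) should return a vector whose largest magnitude is smaller than that of the minimum-norm least-squares solution `np.linalg.pinv(H) @ s`, at the cost of a small residual.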

2) Convergence Rate: The following proposition is an
immediate consequence of the convergence results for
ISTA/FISTA in [27, Thm. 4.4] and characterizes the convergence
rate of FITRA analytically.

Proposition 2: The convergence rate of FITRA (as detailed
in Algorithm 1) satisfies

$$F(\mathbf{x}_k) - F(\mathbf{x}^*) \le \frac{2L\|\mathbf{x}_0 - \mathbf{x}^*\|_2^2}{(k+1)^2},$$

where $\mathbf{x}^*$ denotes the solution to (P-INF-L), $\mathbf{x}_k$ is the FITRA
estimate at iteration $k$, $\mathbf{x}_0$ the initial value at iteration $k = 0$,
and $F(\mathbf{x}) = \lambda\|\mathbf{x}\|_\infty + \|\mathbf{s} - \mathbf{H}\mathbf{x}\|_2^2$.

We emphasize that continuation strategies, e.g., [30], potentially
reduce the computational complexity of FITRA; the
investigation of such methods is left for future work.
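
As a worked consequence of the $O(1/k^2)$ rate, one can estimate how many iterations suffice for a target accuracy $\varepsilon$. The derivation below assumes the bound takes the standard constant-step-size FISTA form of [27, Thm. 4.4]; the symbol $\varepsilon$ is introduced here purely for illustration.

```latex
F(\mathbf{x}_k) - F(\mathbf{x}^*)
  \le \frac{2L\,\|\mathbf{x}_0 - \mathbf{x}^*\|_2^2}{(k+1)^2}
  \le \varepsilon
\quad\Longleftarrow\quad
k \ge \sqrt{\frac{2L}{\varepsilon}}\;\|\mathbf{x}_0 - \mathbf{x}^*\|_2 - 1 .
```

Halving $\varepsilon$ therefore increases the sufficient iteration count by a factor of about $\sqrt{2}$, whereas an $O(1/k)$ method such as ISTA would require about twice as many iterations.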

C. Related Work 

An algorithm to compute an approximation to (P-INF) relying
on an iterative truncation procedure similar to FITRA was
proposed in [24]. The main differences between these algorithms
are as follows: The algorithm in [24] requires the matrix $\mathbf{H}$
to be a tight frame and relies on a constant (and pre-defined)
truncation parameter, which depends on $\mathbf{H}$ and cannot
be computed efficiently. In the present application, however,
the matrix $\mathbf{H}$ is, in general, not a tight frame and depends on
the channel realization; this requires choosing the truncation
parameter in [24] heuristically and hence, convergence of this
method is no longer guaranteed. FITRA, in contrast, does not
require the matrix $\mathbf{H}$ to be a tight frame, avoids manual tuning
of the truncation parameter, and is guaranteed to converge to
the solution of (P-INF-L).



V. Simulation Results 

In this section, we demonstrate the efficacy of the proposed 
joint precoding, modulation, and PAR reduction approach, and 
provide a comparison to conventional MU precoding schemes. 

A. Simulation Parameters 

Unless explicitly stated otherwise, all simulation results are
for a MU-MIMO-OFDM system having $N = 100$ antennas
at the BS and serving $M = 10$ single-antenna terminals. We
employ OFDM with $W = 128$ tones and use a spectral map $\mathcal{T}$
as specified in the 40 MHz mode of IEEE 802.11n [20].$^7$ We
consider coded transmission, i.e., for each user, we independently
encode 216 information bits using a convolutional code
(rate-1/2, generator polynomials $[133_o\ 171_o]$, and constraint
length 7), apply random interleaving (across OFDM tones),
and map the coded bits to a 16-QAM constellation (using
Gray labeling).

To implement (PMP-L), we use FITRA as detailed in Algorithm 1
with a maximum number of $K = 2000$ iterations
and a regularization parameter of $\lambda = 0.25$. In addition to
LS and MF precoding, we also consider the performance of
a baseline precoding and PAR-reduction method. To this end,
we employ LS precoding followed by truncation (clipping)
of the entries of the time-domain samples $\mathbf{a}_n$, $\forall n$. We use a
clipping strategy where one can specify a target PAR, which
is then used to compute a clipping level for which the PAR
of the resulting time-domain samples is no more than
the chosen target PAR.
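
The clipping-level search for the LS+clip baseline can be illustrated as follows. This is only a sketch under assumptions that are not confirmed by the text above: PAR is taken as peak power over average power of one antenna's time-domain samples, clipping is applied to the magnitude (phases are kept), and the level is found by bisection; the names `par` and `clip_to_target_par` are hypothetical.

```python
import numpy as np

def par(a):
    """Peak-to-average power ratio (linear scale) of a sample vector."""
    p = np.abs(a) ** 2
    return p.max() / p.mean()

def clip_to_target_par(a, target_par_db, iters=60):
    """Magnitude-clip `a` so that the clipped signal has PAR <= target.

    Clipping lowers the average power as well as the peak, so the level
    cannot be read off directly. With magnitude clipping at level c the PAR
    is non-decreasing in c, which makes bisection applicable.
    Assumes target_par_db > 0.
    """
    target = 10.0 ** (target_par_db / 10.0)
    if par(a) <= target:
        return a.copy()                       # nothing to clip
    mag = np.abs(a)
    phase = a / np.maximum(mag, 1e-300)       # unit-modulus phase factors
    lo, hi = 0.0, float(mag.max())
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if par(np.minimum(mag, mid) * phase) <= target:
            lo = mid                          # feasible: try a higher level
        else:
            hi = mid
    return np.minimum(mag, lo) * phase
```

The largest feasible level is kept so that the signal is distorted as little as possible for the requested target PAR.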

The precoded and normalized vectors are then transmitted
over a frequency-selective channel modeled as a tap-delay line
with $T = 4$ taps. The time-domain channel matrices $\mathbf{H}_t$,
$t = 1, \ldots, T$, that constitute the impulse response of the channel,
have i.i.d. circularly symmetric Gaussian distributed entries
with zero mean and unit variance. To detect the transmitted
information bits, each user $m$ performs soft-output demodulation
of the received symbols $[\mathbf{y}_w]_m$, $w = 1, \ldots, W$, and
applies a soft-input Viterbi decoder.
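
The channel model above can be sketched as follows. The helper name `tap_delay_channel` is hypothetical, and the per-tone channel is assumed here to be the $W$-point DFT of the zero-padded impulse response; the paper's own indexing and normalization are defined outside this section and may differ.

```python
import numpy as np

def tap_delay_channel(M, N, T, W, rng):
    """Draw T tap matrices with i.i.d. CN(0, 1) entries and their DFT.

    Returns (taps, freq) with shapes (T, M, N) and (W, M, N).
    """
    taps = (rng.standard_normal((T, M, N))
            + 1j * rng.standard_normal((T, M, N))) / np.sqrt(2.0)
    # Zero-pad the impulse response to W taps and transform along the tap axis.
    freq = np.fft.fft(taps, n=W, axis=0)
    return taps, freq

# Parameters used in the simulations: M = 10 users, N = 100 antennas,
# T = 4 taps, W = 128 tones.
taps, freq = tap_delay_channel(10, 100, 4, 128, np.random.default_rng(0))
```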

B. Performance Measures 

To compare the PAR characteristics of different precoding
schemes, we use the complementary cumulative distribution
function (CCDF) defined as

$$\mathrm{CCDF}(\mathrm{PAR}) = \mathrm{P}\{\mathrm{PAR}_n > \mathrm{PAR}\}.$$

We furthermore define the "PAR performance" as the maximum
PAR level $\mathrm{PAR}^*$ that is met for 99% of all transmitted
OFDM symbols, i.e., given by $\mathrm{CCDF}(\mathrm{PAR}^*) = 1\%$. The
error-rate performance is measured by the average (across
users) symbol-error rate (SER); a symbol is said to be in error
if at least one of the information bits per received OFDM
symbol is decoded in error. The "SNR operating point" corresponds
to the minimum SNR required to achieve 1% SER.
In order to characterize the amount of signal power that is 

7 We solely consider $|\mathcal{T}| = 108$ data-carrying tones; the tones reserved for
pilot symbols in IEEE 802.11n [20] are ignored in all simulations.
Fig. 2. Time/frequency representation for different precoding schemes. The target PAR for LS+clip is 4 dB and $\lambda = 0.25$ for PMP relying on FITRA.
(a) Time-domain signals (PAR: LS = 10.4 dB, LS+clip = 4.0 dB, MF = 10.1 dB, and PMP = 1.9 dB). Note that PMP generates a time-domain signal of
substantially smaller PAR than LS and MF. (b) Frequency-domain signals (OBR: LS = $-\infty$ dB, LS+clip = -11.9 dB, MF = $-\infty$ dB, and PMP = -52.9 dB).
Note that LS, MF, and PMP preserve the spectral properties. LS+clip suffers from substantial OBR (visible at both ends of the spectrum).
Fig. 3. PAR and SER performance for various precoding schemes. The target PAR for LS+clip is 4 dB and $\lambda = 0.25$ for PMP relying on FITRA. (a)
PAR performance (the curves of LS and MF overlap). Note that PMP effectively reduces the PAR compared to LS and MF precoding. (b) Symbol error-rate
(SER) performance. Note that the signal normalization causes 1 dB SNR-performance loss for PMP compared to LS precoding. The loss of MF is caused by
residual MUI; the loss of LS+clip is caused by normalization and residual MUI.



transmitted outside the active tones $\mathcal{T}$, we define the
out-of-band (power) ratio (OBR) as follows:



OBR 



Note that for LS and MF precoding, we have OBR = 0, as
they operate independently on each of the $W$ tones; for PMP
or LS followed by clipping, we have OBR > 0 in general.
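
The CCDF-based "PAR performance" defined in this subsection can be estimated from simulated per-symbol PAR values as sketched below. The function names are hypothetical, and reading the 1% point off with `np.quantile` (linear interpolation between order statistics) is one of several reasonable estimators, not necessarily the one used for the figures.

```python
import numpy as np

def empirical_ccdf(par_values_db, thresholds_db):
    """Fraction of observed PAR values that exceed each threshold."""
    v = np.asarray(par_values_db, dtype=float)[:, None]
    t = np.asarray(thresholds_db, dtype=float)[None, :]
    return np.mean(v > t, axis=0)

def par_performance(par_values_db, outage=0.01):
    """PAR level PAR* that is exceeded by roughly `outage` of the symbols."""
    return float(np.quantile(par_values_db, 1.0 - outage))
```

For example, with 100 observed PAR values equal to 1, 2, ..., 100 dB, half of them exceed a threshold of 50 dB, so the empirical CCDF at 50 dB is 0.5.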

C. Summary of PMP Properties 

Figures 2 and 3 summarize the key characteristics of PMP
and compare its PAR-reduction capabilities and error-rate
performance to those of LS and MF precoding, as well as to
LS precoding followed by clipping (denoted by "LS+clip"
in the following). Fig. 2(a) shows the real part of a
time-domain signal $\mathbf{a}_1$ for all precoding schemes (the imaginary
part behaves similarly). Clearly, PMP results in time-domain
signals having a significantly smaller PAR than that of LS
and MF; for LS+clip the target PAR corresponds to 4 dB. The
frequency-domain results shown in Fig. 2(b) confirm that LS,
MF, and PMP maintain the spectral constraints. For LS+clip,
however, the OBR is -11.9 dB, which is a result of ignoring
the spectral constraints (see the non-zero OFDM tones
at both ends of the spectrum in Fig. 2(b)). Fig. 3(a) shows
Fig. 4. SNR, PAR, and OBR performance trade-offs of PMP. The numbers next to the trade-off curve for FITRA correspond to the regularization parameter $\lambda$
used in (PMP-L). The LS+clip curves are parametrized by the target PAR in dB. (a) PAR/SNR trade-off (parts of the FITRA curves overlap). (b) OBR/SNR
trade-off (all curves labeled with $K$ correspond to FITRA).



the PAR-performance characteristics for all considered precoding
schemes. One can immediately see that PMP reduces the
PAR by more than 11 dB compared to LS and MF precoding
(at CCDF(PAR) = 1%); as expected, LS+clip achieves 4 dB
PAR deterministically. In order to maintain a constant transmit
power, the signals resulting from PMP require a stronger
normalization (roughly 1 dB) than the signals from LS precoding;
this behavior causes the SNR-performance loss compared to
LS (see Fig. 3(b)). The performance loss of MF and LS+clip
is mainly caused by residual MUI.

D. SNR, PAR, and OBR Trade-Offs 

As observed in Fig. 3, PMP is able to significantly reduce
the PAR but results in an SNR-performance loss compared to
LS precoding. Hence, there exists a trade-off between PAR and
SER, which can be controlled by the regularization parameter
$\lambda$ of (PMP-L). Fig. 4(a) characterizes this trade-off for $\lambda = 2^{v}$
with $v \in \{-12, \ldots, 4\}$. In addition to the performance of
LS and MF precoding, we show the behavior of LS+clip for
various target-PAR values.

Fig. 4(a) shows that PMP is able to cover a large trade-off
region that can be tuned by the regularization parameter
$\lambda$ of (PMP-L). In particular, for a given number of FITRA
iterations $K = 2000$, decreasing $\lambda$ approaches the performance
of LS precoding, whereas increasing $\lambda$ reduces the PAR but
results in a graceful degradation of the SNR operating point.$^8$
Hence, (PMP-L) allows one to adjust the PAR to the linearity
properties of the RF components, while keeping the resulting
SNR-performance loss at a minimum. As shown in Fig. 4(a),
LS+clip achieves a similar trade-off characteristic as PMP; for
less aggressive values of the target PAR, LS+clip even seems
to outperform PMP.



It is important to realize that even if LS+clip outperforms
PMP in terms of the PAR/SNR trade-off in the high-PAR
regime, LS+clip results in substantial out-of-band interference;
this important drawback is a result of ignoring the shaping
constraints. In particular, we can observe from Fig. 4(b)
that reducing the PAR for LS+clip quickly results in significant
OBR, which renders this scheme useless in practice. By
way of contrast, the OBR of PMP is significantly lower and
degrades gracefully when lowering the PAR. Furthermore, we
see that reducing the maximum number of FITRA iterations $K$
increases the OBR. Hence, the regularization parameter $\lambda$
together with the maximum number of FITRA iterations $K$
determines the PAR, OBR, and SNR performance of PMP. We
finally note that for $K = 2000$ the computational complexity
of FITRA is one-to-two orders of magnitude larger than
that of LS precoding. The underlying reason is the fact that
LS precoding solves N independent problems, whereas PMP
requires the solution to a joint optimization problem among
all N transmit antennas.



E. Impact of Antenna Configuration and Channel Taps 

We finally investigate the impact of the antenna configuration
on the PAR performance of PMP and LS precoding. To
illustrate the impact of the channel model, we also vary the
number of non-zero channel taps $T \in \{2, 4, 8\}$. Fig. 5 shows
that increasing the number of transmit antennas yields improved
PAR performance for PMP; this behavior was predicted
analytically in (5) for the (narrow-band) system considered in
Section III-A2. Increasing the number of channel taps $T$ also



8 For $\lambda > 0$, a small SNR gap remains; for $\lambda \to 0$, however, (PMP-L)
corresponds to LS precoding and the gap vanishes.



has a beneficial impact on the PAR if using PMP. An intuitive 
explanation for this behavior is that having a large number of 
taps increases the number of DoF, which can then be exploited 
by PMP to reduce the PAR. For LS precoding, however, the re- 
sulting PAR is virtually independent of the number of channel 
Fig. 5. PAR performance of PMP and LS precoding depending on the
number of transmit antennas $N$ and the number of non-zero channel taps $T$;
the number of users $M = 10$ is held constant and $\lambda = 0.25$ for PMP relying
on FITRA. (The curves for LS precoding overlap.)



taps.$^9$ In summary, PMP is suitable for MU-MIMO systems
offering a large number of DoF, but also enables substantial
PAR reduction for small-scale MIMO systems and channels
offering only a small amount of frequency diversity.

VI. Conclusions and Outlook

The proposed joint precoding, modulation, and PAR reduction
framework, referred to as PMP, facilitates an explicit
trade-off between PAR, SNR performance, and out-of-band
interference for the large-scale MU-MIMO-OFDM downlink.
As for the constant-envelope precoder in [7], the fundamental
motivation of PMP is the large number of DoF offered by systems
where the number of BS antennas is much larger than the
number of terminals (users). Essentially, the downlink channel
matrix has a high-dimensional null-space, which enables us
to design transmit signals with "hardware-friendly" properties,
such as low PAR. In particular, PMP yields per-antenna
constant-envelope OFDM signals in the large-antenna limit,
i.e., for $N \to \infty$. PMP is formulated as a convex optimization
problem for which a novel efficient numerical technique, called
the fast iterative truncation algorithm (FITRA), was devised.

Numerical experiments showed that PMP is able to reduce
the PAR by more than 11 dB compared to conventional precoding
methods, without creating significant out-of-band interference;
this substantially alleviates the linearity requirements of
the radio-frequency (RF) components. Furthermore, PMP only
affects the signal processing at the BS and can therefore be
deployed in existing MIMO-OFDM wireless communication
systems, such as IEEE 802.11n [20].

In addition to the extensions outlined in Section III-D,
there are many possibilities for future work. Analytical
PAR-performance guarantees of PMP are missing; the development
of such results is a challenging open research topic. Moreover,
9 MF and LS+clip exhibit the same behavior; the corresponding curves are
omitted in Fig. 5.



a detailed analysis of the impact of imperfect channel state 
information on the performance of PMP is left for future work. 
Finally, reducing the computational complexity of FITRA, 
e.g., using continuation [30], is part of ongoing work. 